Personal notes on the smallweb

Posted by pulkomandy on Sun Oct 15 14:10:04 2023 • Comments (0)

Lately I have taken part in some fediverse discussions about the "small web". As you probably know, many websites these days are bloated, loading several megabytes of resources. I remember surfing the web as a kid: I would download games and shareware, and be annoyed when one of them was larger than 1MB, because that took a while to download. Now, a lot of webpages (just the webpage, not the downloaded software!) are already ten times larger than that.

Yet, they do not seem to offer much more than they did 25 years ago. A web page mainly shows text, with some pictures where relevant or just to make it look nice. So, what happened? Well, I don't really know. If anything, my website is getting simpler and smaller over the years. Early versions had MIDI music and a Java applet emulating the Star Wars opening scroll on the landing page.

Anyway, so, the "small web". It seems this idea started gaining traction in the last few years, and it comes from multiple directions. Retrocomputing is trendy (well, it always was), but by now there are decidedly retro machines whose commercial life overlapped the early days of the world wide web, so there is demand from retrocomputer users who want to browse the web like in 1996 or so. Another nostalgic aspect is the "what if?" exploration of Gopher, an early competitor to the web that largely lost to it. And there are the concerns raised, indirectly, by climate change: building more sustainable tech that is less power hungry. Surely, simpler websites and web browsers could be part of that, as well as being genuinely useful to people who have limited internet access for various reasons (living in a very remote area, not being willing or able to pay for a super fast connection, or being in a place where the infrastructure just isn't there).

One thing that is somewhat successful in that domain is the Gemini protocol. It is inspired by Gopher but more modern and a little less quirky; for example, it requires TLS where Gopher was mainly plaintext. Unlike HTML, it gives people writing pages very limited control over text formatting. As a result, it is quite easy to write a Gemini browser, even a command-line one, and indeed there are quite a few of them. I had only a brief look at it and decided it doesn't work for me, for various reasons (this is my personal reaction, not a complaint; I'm sure other people are happy with these choices, and that's good for them). The TLS requirement makes it unreachable for the 8-bit retrocomputers I care about. The limited formatting is often not enough for what I want to do with the web, and likewise, I find the lack of cookies, URL parameters and the like a bit of an extreme measure. I understand the idea of enforcing privacy, but I am not convinced it is worth the resulting limitations.
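
For illustration, here is roughly what a page in Gemini's "gemtext" format looks like (the address is made up). There are only a handful of line types, and no inline formatting at all, which is what makes clients so easy to write:

    # A page title
    Paragraphs are plain lines of text; the client decides how to wrap and style them.
    => gemini://example.org/other-page A link, always on a line of its own
    * A list item
    > A quoted line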

But my main problem with it is that it is new technology: a new protocol, a new page formatting language, new specifications, and new software to write. I get it, a lot of people find these new things exciting; it's fun to write software and specs from scratch, without having to care about legacy quirks and past mistakes. But this is not how I usually think about things. As I learnt in school, a good software developer is a lazy software developer. So my approach is usually to make the most of what's already available, which in this case means the existing web browsers. That's a good idea because we already have a lot of software, a lot of people who know how to write webpages, and a good understanding of the problems and limitations. And there are also users (people browsing the web) who won't need to be convinced to install yet another piece of software on their computers. Can we achieve that? I don't know, but I find that idea a lot more interesting to explore than Gemini's "what if we erased everything we have and started over?" approach.

So, I took a look at the existing specifications for the web. My goal here is to see how to make websites that are "light" (a webpage should be a dozen to a hundred kilobytes or so) and fast to load, yet still have some visual personality. The latter can't really be done in Gemini, and I think it's an important part of the web: allowing everyone to have their own space and really make it reflect who they are. There will be the wall of text just using the default formatting. The cute page with colorful text and animated GIFs. The red-text-on-black-background crazy conspiracy theory website. And so on.

From the technical side, let's try to limit ourselves to the technologies already widely supported by web browsers. Or, to word it differently: I don't think the bloat of the web is a technical problem to be solved by throwing more tech at it. Instead, it's just a matter of making better and more efficient use of what we already have. The core set of technologies is in fact quite reasonable. On the protocol side, we have a choice of at least HTTP 1.0 and 1.1, possibly HTTP 2, possibly some other more experimental protocols. I'm not sure HTTP 2's complexity is really useful for such small websites; 1.1 will probably be good enough.
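
To give an idea of how little machinery that involves: a complete HTTP/1.1 exchange for a small page can be as simple as the following (the hostname and size are made up for the example):

    GET /articles/smallweb.html HTTP/1.1
    Host: example.org
    Connection: close

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=utf-8
    Content-Length: 14200

    <!DOCTYPE html ...the page itself follows...

That is plain, human-readable text over a socket, which is also why it is easy to support even on very small machines.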

More interesting is the content format. Here, we can review the different available options. The oldest one is HTML 3, which does not have CSS and mixes up content and presentation. This is not great by today's standards and does not really make things simpler. The newest one is HTML 5, which is in a kind of "rolling release" mode and keeps changing all the time. Personally I find this to be a problem; I would prefer a fixed set of features.
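
To make the "mixing content and presentation" point concrete, here is a contrived example: in HTML 3 era markup the styling has to be repeated at every use, while CSS states it once:

    <!-- HTML 3 style: presentation spelled out in the content itself -->
    <center><font color="red" size="4">Warning</font></center>

    <!-- HTML 4 + CSS: the markup stays semantic... -->
    <p class="warning">Warning</p>

    <!-- ...and the presentation is written once, in a stylesheet -->
    .warning { color: red; font-size: larger; text-align: center; }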

So, that leaves us with HTML 4. Which turns out to be really interesting, because it is from a time when mobile phones were starting to go online, so there was a simultaneous demand for richer webpages on desktop machines and for really simple ones on the then very limited mobile phones and similar devices. It is also from the time when the web attempted to move to XML with XHTML 1.0, the XML reformulation of HTML 4. This seems to still be controversial: XHTML is a much stricter syntax, while HTML 4 retains the more flexible way of working of HTML 3 and earlier versions. Basically, whatever you write in a text document, a web browser can parse it as HTML. It is very tolerant of unclosed tags, weird nesting of things (putting a paragraph inside a table inside a list item), and so on. XHTML, on the other hand, is allowed to reject a page as invalid XML. Well, it doesn't really matter anyway: existing browsers already accept both, so we can use both depending on the context. If you are writing a static website generator, you can probably make it generate valid XHTML. If you are manually editing webpages, or allowing your users to do so, maybe HTML will provide a less frustrating experience? I'm not sure about that; in my early days of writing HTML and Javascript, I think I would have preferred clear error messages over a browser that always did something, but rarely exactly what I wanted.
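
A small example of the difference: here is a tag-soup fragment that any HTML parser will silently repair, followed by the strict XHTML equivalent that an XML parser insists on:

    <!-- Lenient HTML: the browser closes the list items and the <b> for you -->
    <ul>
      <li>First item
      <li>Second item with some stray <b>bold text
    </ul>

    <!-- Strict XHTML: every element explicitly closed and properly nested -->
    <ul>
      <li>First item</li>
      <li>Second item with some stray <b>bold text</b></li>
    </ul>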

Anyway, with both XHTML and HTML 4, one interesting aspect the W3C worked on is modularity. You can pick a profile like XHTML Basic, which gives a set of recommendations for which tags should be used or not. The selected set seems reasonable for my use of the web; it does not seem very constraining to me. Likewise for CSS: you can easily decide to limit yourself to "level 1" or "level 2" features. Or at least, you can make sure your website "degrades" to an acceptable rendering on browsers that support only these, while making use of more advanced features on browsers that can handle them.
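
CSS makes this kind of degradation easy, because browsers simply drop declarations they don't understand. You can write a simple fallback first and the fancier rule after it; a minimal sketch:

    h1 {
      /* Plain color first: understood by every CSS level 1 browser */
      background-color: #336;
      /* Old browsers drop this whole line as unknown and keep the plain
         color above; newer ones draw the gradient over it */
      background-image: linear-gradient(to right, #336, #635);
    }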

Finally, we have to talk about Javascript. Javascript is not a great language. We know why that is: it was designed in a very short time, with requirements along the lines of "Java is trendy, but object-oriented programming is too complicated, can you make us a version of Java without objects? We need it in two weeks". As you would expect, the result is not great, and the language did get objects later on anyway. Well, unfortunately, we don't really have a choice. So, my approach is to limit Javascript usage to the minimum possible, and make sure my websites work well on browsers that don't run Javascript. But I still find it useful, for example for client-side pre-validation of forms. It is useful to have immediate feedback on required fields in a form, and not discover problems only after you have submitted it. That kind of thing.
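
As a sketch of that kind of minimal, optional Javascript (the form and field names are invented for the example, and the server must of course validate everything again):

    <form id="comment-form" method="post" action="/comment">
      <textarea name="message"></textarea>
      <input type="submit" value="Send">
    </form>
    <script>
    // Pre-validate before submitting. Without Javascript, the form still
    // submits normally and the server reports any problem instead.
    document.getElementById("comment-form").onsubmit = function () {
        if (this.elements.message.value.trim() === "") {
            alert("Please write a message before sending.");
            return false; // cancel the submission
        }
        return true;
    };
    </script>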

Also, Javascript allows setting up things like "AJAX" (OK, no one calls it that anymore): basically, you can have dynamic webpages where just one part of the page is reloaded, or generated by Javascript code from a resource fetched over the network. Sure, this makes the browser quite a bit more complex, and the webpage will need some processing. But it doesn't necessarily have to turn into megabytes of bloat. In fact, if used well, this can be very efficient in terms of bandwidth, since the whole webpage does not need to be reloaded. So, I don't like the idea of banning Javascript. Just make its use minimal and, where it makes sense, have a fallback for browsers without Javascript.
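
A minimal version of that pattern, with the fallback built in (the URLs and ids are made up): without Javascript, the link below is a perfectly ordinary page load, and the script merely upgrades it to a partial update:

    <a href="/comments.html" id="comments-link">Load the comments</a>
    <div id="comments"></div>
    <script>
    document.getElementById("comments-link").onclick = function () {
        var xhr = new XMLHttpRequest();
        // Fetch just the comments fragment, not a whole new page
        xhr.open("GET", "/comments-fragment.html");
        xhr.onload = function () {
            document.getElementById("comments").innerHTML = xhr.responseText;
        };
        xhr.send();
        return false; // cancel the normal navigation; browsers without
                      // Javascript simply follow the href instead
    };
    </script>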

Finally, one thing that is a bit underused in the current web is RSS feeds. Or, more broadly, the idea of exposing website data in a format that's easy to parse and process by applications. I think this is a path that deserves to be explored more. Not only RSS, but also, more generally, exposing any type of data as well-structured XML, and maybe using XSLT to process it into other formats. This is well supported in most web browsers and could certainly get more use. I like the idea of websites exposing data as XML that can also be easily downloaded and processed elsewhere (by some other website, or by a "native" app on the user's computer). This is kind of a tangent to the small web, but maybe there's an opportunity to build something new and exciting here.
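
As a sketch of that idea (the file names are invented): a site can serve its data as plain XML with an xml-stylesheet processing instruction, and the browser itself transforms it into a readable page, while the very same file stays trivial for any other program to parse:

    <?xml version="1.0" encoding="utf-8"?>
    <?xml-stylesheet type="text/xsl" href="articles.xsl"?>
    <articles>
      <article date="2023-10-15">
        <title>Personal notes on the smallweb</title>
        <link>/articles/smallweb.html</link>
      </article>
    </articles>

Here articles.xsl would be an ordinary XSLT 1.0 stylesheet producing the HTML view; browsers have shipped XSLT 1.0 processors for a very long time.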

So, what will I do, then? Well, not much. This website should be XHTML 1.0 Basic compatible, and it uses no Javascript, because it doesn't need any more than that. It does use some modern CSS features, but even with CSS disabled, you can read the text of the articles and use the website in an acceptable way. I hope this inspires other people to do the same. Maybe if more and more websites are built like this, the bloat of the big modern ones will stand out more, and people will notice and complain about it?
